NLlex – a tool to generate lexical analyzers for natural language
Author
Abstract
In this paper we present NLlex, a generator of lexical analysis programs for natural language that resembles Unix lex extended with morphological analysis and other natural language (NL) elements. NLlex generates a C program that is linked with a morphological analyzer and other modules to produce an NL processor. In particular, NLlex can generate modules that work as a lexico-morphological analyzer (to be called from yacc, NLyacc, btyacc, or any other module that needs it) or as a simple lexical processing tool. NLlex can also handle, and be tuned to, the non-textual elements frequently found in text (markup elements, LaTeX-like constructs, dates, quotes, ...). An interface between NLlex and Prolog has been developed, and a Perl interface is under development.
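To make the idea of a lexical analyzer that also recognizes non-textual elements more concrete, here is a minimal hand-written sketch in C. It is not NLlex output and does not use NLlex's specification syntax, which the abstract does not show. The token classes, the `next_token` function, and the `DD/MM/YYYY` date shape are all assumptions chosen for illustration, and the scanner handles ASCII input only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes; the abstract does not list NLlex's real ones. */
typedef enum { TOK_END, TOK_WORD, TOK_DATE, TOK_MARKUP, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the token inside the input */
    size_t len;        /* token length in bytes */
} Token;

/* Returns the length of a date shaped like DD/MM/YYYY at s, or 0 if none.
   The date format is an assumption made for this sketch. */
static size_t match_date(const char *s) {
    static const char shape[] = "dd/dd/dddd";
    size_t i;
    for (i = 0; shape[i] != '\0'; i++) {
        if (shape[i] == 'd') {
            if (!isdigit((unsigned char)s[i])) return 0;
        } else if (s[i] != shape[i]) {
            return 0;
        }
    }
    return i;
}

/* Returns the length of a tag such as <p> or </p> at s, or 0 if none. */
static size_t match_markup(const char *s) {
    const char *close;
    if (s[0] != '<') return 0;
    close = strchr(s, '>');
    return close ? (size_t)(close - s) + 1 : 0;
}

/* Scans one token starting at *cursor and advances *cursor past it.
   A parser could call this repeatedly, much as yacc calls yylex(). */
Token next_token(const char **cursor) {
    const char *p = *cursor;
    Token t;
    size_t n;

    while (isspace((unsigned char)*p)) p++; /* skip leading whitespace */
    t.start = p;

    if (*p == '\0') {
        t.kind = TOK_END;
        n = 0;
    } else if ((n = match_date(p)) > 0) {
        t.kind = TOK_DATE;
    } else if ((n = match_markup(p)) > 0) {
        t.kind = TOK_MARKUP;
    } else if (isalnum((unsigned char)*p)) {
        t.kind = TOK_WORD;
        n = 0;
        while (isalnum((unsigned char)p[n])) n++;
    } else {
        t.kind = TOK_PUNCT; /* any other single character */
        n = 1;
    }
    t.len = n;
    *cursor = p + n;
    return t;
}
```

Dates and markup are tried before plain words so that a string such as `12/03/2007` is returned as one token instead of being split at the slashes. In a setup like the one the abstract describes, each word token would additionally be passed to a morphological analyzer; that step is omitted here.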
Similar resources
A Computational Lexicon of Contemporary Hebrew
Computational lexicons are among the most important resources for natural language processing (NLP). Their importance is even greater in languages with rich morphology, where the lexicon is expected to provide morphological analyzers with enough information to enable them to correctly process intricately inflected forms. We describe the Haifa Lexicon of Contemporary Hebrew, the broadest-coverag...
Design and Implementation of an Intelligent Part of Speech Generator
The aim of this paper is to report on an attempt to design and implement an intelligent system capable of generating the correct part of speech for a given sentence while the sentence is totally new to the system and not stored in any database available to the system. It follows the same steps a normal individual does to provide the correct parts of speech using a natural language processor. It...
General Incremental Lexical Analysis
We present the first fully general approach to the problem of incremental lexical analysis. Our approach utilizes existing generators of (batch) lexical analyzers to derive the information needed by an incremental run-time system. No changes to the generator’s algorithms or run-time mechanism are required. The entire pattern language of the original tool is supported, including such features as...
Comparing Lexical Bundles in Hard Science Lectures; A Case of Native and Non-Native University Lecturers
Researchers have stated that learning and applying a certain set of native lecturers' lexical bundles would help non-native lecturers' students improve their proficiency through incidental vocabulary input. The present study sheds light on the lexical bundles used in hard science lectures by native and non-native lecturers in international universities, with the main purpose of analyzing the structu...
Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations
The main task of tokenization is to divide the sentences of a text into their constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical written string that forms an independent semantic unit. Tokenization occurs at the word level, and the extracted units can be used as input to other components such as a stemmer. The requirement to create...
Publication date: 2007